perm filename GALLEY.TEX[TEX,DEK]1 blob sn#362417 filedate 1978-06-22 generic text, type C, neo UTF8
\input acphdr
\runninglefthead{ARITHMETIC---FIRST PROOFS $\copyright$ 1978}
\runningrighthead{ARITHMETIC---FIRST PROOFS $\copyright$ 1978}
\section{4.x}
\setcpage 701
\tenpoint
%folio 285 galley 7 WARNING: Much tape unreadable! (C) Addison-Wesley 1978	*
\runningrighthead{ACCURACY OF FLOATING-POINT ARITHMETIC}
\section{4.2.2}
\sectionskip
\sectionbegin{4.2.2. Accuracy of Floating-Point Arithmetic}
Floating-point computation is by nature
inexact, and it is not difficult to misuse it so that the computed
answers consist almost entirely of ``noise.'' One of the principal
problems of numerical analysis is to determine how accurate
the results of certain numerical methods will be; a ``credibility-gap''
problem is involved here: we don't know how much of the computer's
answers to believe. Novice computer users solve this problem
by implicitly trusting in the computer as an infallible authority;
they tend to believe that all digits of a printed answer are
significant. Disillusioned computer users have just the opposite
approach: they are constantly afraid that their answers are almost
meaningless. Many a serious mathematician has attempted to give
rigorous analyses of a sequence of floating-point operations,
but has found the task to be so formidable that he has tried to content
himself with plausibility arguments instead.

A thorough examination of error analysis techniques
is, of course, beyond the scope of this book, but we will in
this section study some of the characteristics of floating-point
arithmetic errors. Our goal is to discover how to perform floating-point
arithmetic in such a way that reasonable analyses of error propagation
are facilitated as much as possible.

A rough (but reasonably useful) way to express
the behavior of floating-point arithmetic can be based on the
concept of ``significant figures'' or {\sl relative error.}
If we are representing an exact real number $x$ inside a computer
by using the approximation $\A x = x(1 + ε)$, the quantity
$ε = (\A x - x)/x$ is called the relative error of approximation.
Roughly speaking, the operations of floating-point multiplication
and division do not magnify the relative error by very much;
but floating-point subtraction of nearly equal quantities (and
floating-point addition, $u \oplus v$, where $u$ is nearly equal
to $-v$) can very greatly increase the relative error. So we
have a general rule of thumb, that a substantial loss of accuracy
is expected from such additions and subtractions, but not from
multiplications and divisions. On the other hand, the situation
is somewhat paradoxical and needs to be understood properly,
since ``bad'' additions and subtractions are performed with
perfect accuracy! (See exercise 25.)
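The same rule of thumb is easy to observe in today's binary double-precision arithmetic rather than the book's eight-digit decimal arithmetic; the following Python sketch shows operands correct to about one part in $10^{16}$ whose exactly computed difference is pure noise.

```python
# Each operand below is the correctly rounded double nearest its
# decimal value, so each carries a relative error near 1 part in 10^16.
a = 0.1 + 0.2      # computed value: 0.30000000000000004
b = 0.3

# The subtraction is performed with *perfect* accuracy, since the
# operands are nearly equal; yet the true value of 0.1 + 0.2 - 0.3
# is 0, so the result consists entirely of inherited rounding error.
d = a - b
print(d)           # 5.551115123125783e-17

# Multiplication, by contrast, leaves the relative error essentially
# unchanged: a * b is still good to about 1 part in 10^16.
p = a * b
```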

\topinsert{\quoteformat{Round numbers are always false.\cr}
\author{SAMUEL JOHNSON (1750)}
\quoteformat{I shall speak in round numbers, not absolutely accurate,\cr
yet not so wide from truth as to vary the result materially.\cr}
\author{THOMAS JEFFERSON (1824)}}

One of the consequences of the possible unreliability
of floating-point addition is that the associative law breaks
down:
$$(u \oplus v) \oplus w ≠ u \oplus (v \oplus w),\qquad\hjust{for many }
u, v, w.\eqno (1)$$
For example,
$$\eqalign{(11111113. \oplus -11111111.) \oplus 7.5111111 ⊗= 2.0000000
\oplus 7.5111111\cr
⊗= 9.5111111;\cr
\noalign{\vskip 3pt}
11111113. \oplus (-11111111. \oplus 7.5111111) ⊗= 11111113.
\oplus -11111103.\cr
⊗= 10.000000.\cr}$$
(All examples in this section are given in eight-digit floating-decimal
arithmetic, with exponents indicated by an explicit decimal point. Recall
that, as in Section 4.2.1, the symbols $\oplus$, $\ominus$, $\otimes$,
$\odiv$ are used to stand for floating-point operations corresponding to
the exact operations $+$, $-$, $\times$, $/$.)
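The same failure of associativity appears in binary double precision; a minimal Python illustration (not the book's eight-digit decimal arithmetic):

```python
# Association determines which intermediate sum gets rounded,
# so the two groupings can round differently.
left  = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)   # 0.6
print(left == right)        # False
```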

In view of the failure of the associative law,
the comment of Mrs. La Touche which appears at the beginning
of this chapter [taken from {\sl Math.\ Gazette \bf 12} (1924),
95] makes a good deal of sense with respect to floating-point
arithmetic. Mathematical notations like ``$a↓1 + a↓2 + a↓3$''
or ``$\sum ↓{1≤k≤n} a↓k$'' are inherently based upon the assumption
of associativity, so a programmer must be especially careful
that he does not implicitly assume the validity of the associative
law.
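The practical consequence for $\sum↓{1≤k≤n}a↓k$ is that the order of summation can change the answer; a Python sketch (binary doubles, with the standard-library routine {\tt math.fsum} shown for contrast):

```python
import math

terms = [1e16, 1.0, -1e16]

# Left-to-right summation: 1e16 + 1.0 rounds back to 1e16, because
# 1.0 falls below the rounding precision at that magnitude; the
# final answer is therefore 0.0.
print(sum(terms))                # 0.0

# Reassociating the same terms rescues the lost 1.0:
print(sum([1e16, -1e16, 1.0]))   # 1.0

# math.fsum tracks the rounding errors exactly and recovers 1.0
# from either ordering.
print(math.fsum(terms))          # 1.0
```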

\subsectionbegin{A. An axiomatic approach} Although
the associative law is not valid, the commutative law
$$u \oplus v = v \oplus u\eqno (2)$$
does hold, and this law can be a valuable conceptual
asset in programming and in the analysis of programs. This example
suggests that we should look for important laws which {\sl are}
satisfied by $\oplus$, $\ominus$, $\otimes$, and $\odiv$; it is
not unreasonable to say that {\sl floating-point routines should
be designed to preserve as many of the ordinary mathematical
laws as possible.}

Let us therefore consider some of the other basic laws
which are valid for normalized floating-point operations as
described in the previous section. First we have
$$u \ominus v = u \oplus -v;\eqno (3)$$
$$-(u \oplus v) = -u \oplus -v;\eqno (4)$$
$$u \oplus v = \hjust{round}(u + v),\qquad u \ominus v = \hjust{round}(u - v),\qquad
u \otimes v = \hjust{round}(u \times v),\qquad u \odiv v = \hjust{round}(u/v),\eqno (9)$$
where $\hjust{round}(x)$ denotes the best floating-point
approximation to $x$. We have
$$\hjust{round}(-x) = -\hjust{round}(x),\eqno (10)$$
$$x ≤ y\qquad\hjust{implies}\qquad \hjust{round}(x) ≤ \hjust{round}(y),\eqno (11)$$
and these fundamental relations lead to immediate
proofs of properties (2) through (8). We can also write down
several more identities:
$$u \otimes v = v \otimes u,\qquad (-u) \otimes v = -(u \otimes v),\qquad 1 \otimes v = v;$$
$$u \otimes v = 0\qquad\hjust{if and only if}\qquad u = 0\qquad\hjust{or}\qquad v = 0;$$
$$(-u) \odiv v = u \odiv (-v) = -(u \odiv v),\qquad 0 \odiv v = 0;$$
$$u \odiv 1 = u,\qquad u \odiv u = 1.$$
If $u ≤ v$ and $w > 0$ then $u \otimes w ≤ v \otimes
w$, $u \odiv w ≤ v \odiv w$, and $w \odiv u ≥ w \odiv
v$. If $u \oplus v = u + v$ then $(u \oplus v) \ominus v = u$,
and if $u \otimes v = u \times v ≠ 0$ then $(u \otimes v) \odiv
v = u$, etc. We see that a good deal of regularity is present
in spite of the inexactness of the floating-point operations.
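Several of these identities can be spot-checked directly in IEEE double precision (binary, not the book's decimal arithmetic); a small Python sketch:

```python
u, v = 11111113.0, 7.5111111

# (3): subtraction is addition of the negation, exactly.
assert u - v == u + (-v)

# (4): negation commutes with rounded addition,
# since round(-x) = -round(x).
assert -(u + v) == (-u) + (-v)

# Multiplication is commutative.
assert u * v == v * u

# When u (+) v happens to be exact, (u (+) v) (-) v recovers u:
a, b = 1.5, 2.25          # 1.5 + 2.25 = 3.75 exactly
assert (a + b) - b == a
print("identities verified for these operands")
```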
\vfill\eject
\setcpage 801
\runninglefthead{ANSWERS TO EXERCISES---FIRST PROOFS $\copyright$ 1978}
\runningrighthead{ANSWERS TO EXERCISES---FIRST PROOFS $\copyright$ 1978}
\section{4.x}
\ninepoint